Finding the K best policies in a finite-horizon Markov decision process
Authors
Abstract
Directed hypergraphs are a general modelling and algorithmic tool that has been successfully used in many different research areas such as artificial intelligence, database systems, fuzzy systems, propositional logic and transportation networks. However, modelling Markov decision processes using directed hypergraphs has not yet been considered. In this paper we consider finite-horizon Markov decision processes (MDPs) with finite state and action spaces and present an algorithm for finding the K best deterministic Markov policies. That is, we are interested in ranking the first K deterministic Markov policies in non-decreasing order using an additive criterion of optimality. The algorithm uses a directed hypergraph to model the finite-horizon MDP. It is shown that the problem of finding the optimal policy can be formulated as a minimum weight hyperpath problem and solved in linear time, with respect to the input data representing the MDP, using different additive optimality criteria.
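For the expected-total-cost criterion, the minimum-weight-hyperpath computation the abstract describes coincides with standard backward induction on the finite-horizon MDP. The sketch below illustrates that baseline (K = 1) computation; the tiny two-state MDP and all function names are hypothetical illustrations, not taken from the paper.

```python
def backward_induction(states, actions, horizon, cost, trans):
    """Return (value, policy) for a finite-horizon MDP under the
    expected-total-cost criterion.

    value[t][s]  -- minimal expected cost-to-go from state s at stage t
    policy[t][s] -- a minimizing action at (t, s)
    cost(t, s, a)  -> stage cost (float)
    trans(t, s, a) -> dict {next_state: probability}
    """
    # Terminal values are zero; earlier stages are filled in backwards.
    value = [{s: 0.0 for s in states} for _ in range(horizon + 1)]
    policy = [{} for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in states:
            best_a, best_v = None, float("inf")
            for a in actions:
                # One-stage cost plus expected cost-to-go.
                v = cost(t, s, a) + sum(
                    p * value[t + 1][s2] for s2, p in trans(t, s, a).items()
                )
                if v < best_v:
                    best_a, best_v = a, v
            value[t][s] = best_v
            policy[t][s] = best_a
    return value, policy


# Hypothetical two-state, two-action, two-stage example.
states, actions = ["s0", "s1"], ["stay", "go"]

def cost(t, s, a):
    return 1.0 if a == "go" else 2.0

def trans(t, s, a):
    return {"s1": 1.0} if a == "go" else {s: 1.0}

value, policy = backward_induction(states, actions, 2, cost, trans)
# "go" is cheaper at every stage, so the optimal policy always chooses it
# and the optimal cost from s0 over two stages is 1.0 + 1.0 = 2.0.
```

The loop touches each (stage, state, action, successor) tuple once, matching the linear-time claim for the input data representing the MDP; ranking the K best policies additionally requires a K-shortest-hyperpaths procedure on the same hypergraph model.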
Similar papers
Finding the K best policies in finite-horizon Markov decision processes
Directed hypergraphs represent a general modelling and algorithmic tool, which have been successfully used in many different research areas such as artificial intelligence, database systems, fuzzy systems, propositional logic and transportation networks. However, modelling Markov decision processes using directed hypergraphs has not yet been considered. In this paper we consider finite-horizon ...
Finite-Horizon Markov Decision Processes with State Constraints
Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize costs) in a given stochastic dynamical environment. In many practical scenarios (multi-agent systems, telecommunication, queuing, etc.), the decision-making probl...
Finite-Horizon Markov Decision Processes with Sequentially-Observed Transitions
Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (or minimize costs) in a given stochastic dynamical environment. In this paper, we extend this model by incorporating additional information that the transitions due to act...
Non-randomized policies for constrained Markov decision processes
This paper addresses constrained Markov decision processes, with expected discounted total cost criteria, which are controlled by nonrandomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the series of finite horizon value functions to the infinite horizon value function is also shown. A simple example illustrating an application is presented.
Loss Bounds for Uncertain Transition Probabilities in Markov Decision Processes
We analyze losses resulting from uncertain transition probabilities in Markov decision processes with bounded nonnegative rewards. We assume that policies are pre-computed using exact dynamic programming with the estimated transition probabilities, but the system evolves according to different, true transition probabilities. Our approach analyzes the growth of errors incurred by stepping backwa...
Journal:
European Journal of Operational Research
Volume 175, Issue
Pages -
Publication date: 2006